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Multiscale Inference for High-Frequency Data 

Adam Sykulski, Sofia C. Olhede and Grigorios A. Pavliotis 



Abstract 

This paper proposes a novel multiscale estimator for the integrated volatility of an Ito process, in the presence 
of market microstructure noise (observation error). The multiscale structure of the observed process is represented 
frequency-by-frequency and the concept of the multiscale ratio is introduced to quantify the bias in the realized 
Q>^ ■ integrated volatility due to the observation error The multiscale ratio is estimated from a single sample path, and a 

\ frequency-by-frequency bias correction procedure is proposed, which simultaneously reduces variance. We extend 

■ the method to include correlated observation errors and provide the implied time domain form of the estimation 

procedure. The new method is implemented to estimate the integrated volatility for the Heston and other models, 
and the improved performance of our method over existing methods is illustrated by simulation studies. 
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Index Terms 

Bias correction; market microstructure noise; realized volatility; multiscale inference; Whittle likelihood. 



■ I. Introduction 

Over the last few decades there has been an explosion of available data in diverse areas such as econometrics, 
atmosphere/ocean science and molecular biology. It is essential to use this available data when developing and 
testing mathematical models in physics, finance, biology and other disciplines. It is imperative, therefore, to develop 
accurate and efficient methods for making statistical inference in a parametric as well as a non-parametric setting. 
Many interesting phenomena in the sciences are inherently multiscale in the sense that there is an abundance 
^ of characteristic temporal and spatial scales. It is quite often the case that a simplified, coarse-grained model is 
used to describe the essential features of the problem under investigation. Available data is then used to estimate 
0\ . parameters in this reduced model [12], [17], [18]. This renders the problem of statistical inference quite subtle, 
2 ■ since the simplified models that are being used are compatible with the data only at sufficiently large scales. In 
particular, it is not clear how and if the high frequency data that is available should be used in the statistical 
inference procedure. 

5o ! On the other hand in many applications such as econometrics [32] and oceanography [13] the observed data 
O is contaminated by high frequency observation error. Statistical inference for data with a multiscale structure and 
I> for data contaminated by high frequency noise share common features. In particular the main difficulty in both 
k> problems is that the model that we wish to fit the data to is not compatible with the data at all scales. This is an 
5_( example of a model misspecification problem [20, p. 192]. 



Parametric and non-parametric estimation for systems with multiple scales and/or the usage of high frequency 
data has been studied quite extensively in the last few years for the two different types of models. First, the 
problem of estimating the integrated stochastic volatility in the presence of high frequency observation noise has 
been considered by various authors [1], [36]. Similar models and inference problems have also been studied in 
the context of oceanic transport [13]. It was assumed in [1], [36] that the observed process consists of two parts, 
an Ito process Xt (i.e. the solution of an SDE, which is a semimartingale) whose integrated stochastic volatility 
(quadratic variation {Xt,Xt)) we want to estimate, and a high frequency noise component et- 

Yt^=Xt^+et^. (1) 

{^tj}^=i are the sampled observations. The additional noise {£t^}^^i was used to model market microstructure. 

It was shown for the model of ([T]) that using high frequency data leads to asymptotically biased estimators. In 
particular if all available data is used for the estimation of the quadratic variation of Xt then the realized integrated 
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volatility [Yt, Yt] converges to the aggregated variance of the differenced observation noise. Subsampling is therefore 
necessary for the accurate estimation of the integrated volatility. An algorithm for estimating the integrated volatility 
which consists of subsampling at an optimal sampling rate combined with averaging and an appropriate debiasing 
step was proposed in [1], [36]. Various other estimators were suggested in [11], [15], [32], [36] for processes 
contaminated by high frequency nuisance structure. 

Secondly, parameter estimation for fast/slow systems of SDEs for which a limiting SDE for the slow variable 
can be rigorously shown to exist was studied in [24]-[26]. In these papers the problem of making inferences for 
the parameters of the limiting (coarse-grained) SDE for the slow variable from observed data generated by the 
fast/slow system was examined. It was shown that the maximum likelihood estimator is asymptotically biased. In 
order to correctly estimate the parameters in the drift and the diffusion coefficient of the coarse-grained model from 
observations of the slow/fast system using maximum likelihood, subsampling at an appropriate rate is necessary. 
The subsampling rate depends on the ratio between the characteristic time scales of the fast and slow variables. A 
similar problem, with no explicit scale separation, was studied in [7]. 

All of the papers mentioned above propose inference methods in the time domain. Yet, it would seem natural 
to analyse multiscale and high frequency properties of the data in the frequency domain. Most of the time domain 
methods can be put in a unified framework as linear filtering techniques, i.e. as a convolution with a linear kernel, 
of some time-domain quadratic function of the data. The understanding of these methods is enhanced by studying 
them directly in the frequency domain, as convolutions in time are multiplications in frequency. Fourier domain 
estimators of the integrated volatility have been proposed for observations devoid of microstructure features, see 
[2], [14], [21]. Fourier domain estimators have also been used for estimating noisy Ito processes (i.e. processes of 
the form[T]), see [22], [29], [30], based on smoothing the time domain quantities by using only a limited number 
of frequencies in the reconstruction. 

The bias in the realized integrated volatility of the observed process Yt. due to the observation noise et- can be 
understood directly in the frequency domain, since the energy associated with each frequency is contaminated by 
the microstructure noise process. This bias is particularly damaging at high frequencies. In this article we propose a 
frequency-by-frequency de-biasing procedure to improve the accuracy of the estimation of the integrated volatility. 
The proposed estimation method can also be viewed in the time domain as smoothing the estimated autocovariance 
of the increments of the process, but where the implied time domain smoothing kernel is itself estimated from the 
observed process. 

In this paper we will consider a regularly sampled Ito process with additive white noise et- superimposed upon 
it at each observation point tj, cf (dJ. The Ito process satisfies an SDE of the form 

dXt = iitdt + atdBt, Xq = xq. (2) 

Bt denotes a standard one dimensional Brownian motion and ^t, crt are (in general) Ito processes, see for example 
the Heston model which is studied in Section |llll The Brownian motions driving the three Ito processes can be 
correlated. The observations and the process are related through 

Yt^=Xt^+et^, i = l,2,...,iV + l, tj:=^^T={j-l)At. (3) 

We assume the data is regularly spaced. The length of the path T is fixed. The additive noise {£tj}fJi^ is initially 
taken to be a white noise process with variance a^, and it is assumed to be independent of the noise that drives 
the Ito process Xt. Our main objective is to estimate the integrated volatility, {X, X)t = Jq erf dt of the Ito 

process {Xt], from the set of observations {Yt-^^^^ . In the absence of market microstructure noise (i.e., when 
Yt- = Xt^ , j = 1, . . . , + 1) the integrated volatility can be estimated from the realized integrated volatility of 
the process {Yt} [32]. In the presence of market microstructure noise this is no longer true, see also [36], and a 
different estimation procedure is necessary. 

The proposed estimator can be described roughly as follows. Let { } denote the Discrete Fourier Transform 
(DFT) of the differenced sampled Xt process, and similarly for Yt- and £t- . The integrated volatility can be written 
in terms of the inverse DFT of the variance of J^; . We calculate the bias in the variance of , when using 
its sample estimator to estimate the variance of . The high frequency coefficients are heavily contaminated by 
the microstructure noise. With a formula for the bias it is possible to debias the estimated variance of the Fourier 
transform at every frequency, with the unknown parameters of the bias estimated using the Whittle UkeUhood [34], 
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[35]. This produces a debiased estimator of the integrated volatility via an aggregation of the estimated variance, 
and we show also that the variance of the proposed estimator is reduced by the debiasing. 

Our estimator shows highly competitive mean square error performance; it also has several advantages over 
existing estimators. First, it is robust with respect to the signal to noise ratio; furthermore, it is easy to formulate 
and to implement; in addition, it readily generalizes to the case of correlated observation errors (in time). Finally, 
the properties of our estimator are transparent using frequency domain analysis. 

The rest of the paper is organized as follows. In Section |ll] we introduce our estimator and present some of its 
properties, stated in Theorems [J and [2l We also discuss the time-domain understanding of the proposed method 
and the extension of the method to the case where the observation noise is correlated. In Section |lll] we present 
the results of Monte Carlo simulations for our estimator. Section |IV] is reserved for conclusions. Various technical 
results are included in the appendices. 



II. Estimation Methods 

Let {itj} be given by dS]), where the noise {stj} is independent of {^tj}, is zero-mean and its variance at 
any time is equal to o"^. The simplest estimator of the integrated volatility of Xt would ignore the high frequency 
component of the data and use the realized integrated volatility of the observed process. The realized integrated 
volatility is given by 

(b) ^ / 1 \ 

{Xx)r = [Y, Y]^ = (^*... - ^*^)' = ^ • 
j=l V / 

This estimator is both inconsistent and biased, see [15]. For comparative purposes, we define also the realized 
integrated volatility of the sampled process {Xt^}: 

(u) ^ 

{Xx)t = [X, X]t ^ Yl -^tf = {NAt) = 0{1). (5) 

i=i 

This cannot be used in practice as Xt is not directly observed. Both these are estimators of the integrated volatility 
(quadratic variation) of X. 



A. Fourier Domain Properties 

We shall start by deriving an alternative representation of ^ to motivate further development. Firstly we define the 
increment process of a sample from a generic time series Ut^, j = 1, . . . 1 by AUt^ = Ut^^-, —Ut^, j = 1, . . . N, 
and then the discrete Fourier Transform of AUt- by J^^'* as by [27] [p. 206] 



fk = ^, U = X,Y,e. 



(Y) 

Our proposed estimator will be based on examining the second order properties of {J} }. 



odogram [5] defined for a time series and is an inefficient estimator of varjj^^''} = S^^f^' . Firstly we examine the 
properties of {jI^^}- We have, with Jlj = f(j^\)At '^^ denoting the local average of fit, 



J, 



(6) 



is the peri- 



AXt^ 

'J u 



pAt nAt 

/ [fisds + asdWs]=]IjAt+ cTsdWs, 
J(j-l)At J(.i-l)At 

N 



jAt 
(i-l)At 





pjAt 


AtJIj + 


/ asdWs 


J{j-l)At 



a^dW^e 



(7) 



j^^J(j-l)At 
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We define 



J, 



(X) 



1 Jij-mt 



(X) 

and this to leading order approximates as At ^ for all but a few frequencies. We can also note that, since 
fig is an Ito process, it has almost surely continuous paths, which implies that 

2 



N-l 



k= 
N-l 



N 



k=0 



i=i 



E 



1 



AtiV//c 



AtiV/A; 



= O (At) 

O flog(At)VAt) , 



(8) 



(9) 



as At fijC ^^'^ « = O . So we only need, to leading order, calculate X^^g^ 
the properties of X^^q^ "^i^^ I frorn dSjl and (|9ll. More formally we note that 



J, 



0(1) when calculating 



Af-l 



k=0 



N-l 



E|4^^^ = EI^''^^+oM^^)^ 



k=0 



We need to determine the first and second order structure of {J^ j^. In general {J^ is a complex-valued 
random vector, which may not be a sample from a multivariate Gaussian distribution. The covariance matrix of a 
complex random vector Z is given by cov {Z, Z} = E {ZZ^} - E {Z} E {Z}^ [23], [28]. We have 



E 



{in 



0, k = l,...,N - 1. 



Furthermore, with = E { J^f ^^f 



'^ki,k2 



nAt 



E 



agdWsatdWte 



-2-(^ 



N N J 



^J{n-l)At ^^-^ J(l-l)At 
N rnAt N AAt 



T-tY E / E{a,dW,atdt^ae-2^"(^-^). 



N rnAt N AAt 



4E r E / E{asat}6ni6it-s)e-^'<'^-'^)dsdt. 

^ J (n-l)At ^ J{l~l)At 



In particular we have that 



^1 F.{al]ds + (At2) := + O (At^ 



N 



+ O (At^) , 



where the error terms are due to the Riemann approximation to an integral and thus it follows that 



N-l 

E4 

fc=0 



{X) 
k,k 



T 



E{a2}ds + 0(At). 



(10) 



(11) 



a\ does not depend on the value of k but is constant irrespectively of the value of k. Malliavin and Mancino [21] 
in contrast under very light assumptions show how the Fourier coefficients of {(t|} can be calculated from the 
Fourier coefficients of dXt, using a Parseval-Rayleigh relationship, see also [22], [30]. We can from (ITOl ) make a 
stronger link from the Fourier transform to the integrated volatility than that of the Parseval-Rayleigh relationship, 
and shall use this 'uniformity of energy' to estimate the microstructure bias. 
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We note that the covariance between different frequencies is given by: 



-y 



N 
1 

N 



n=l 
•T 



(ra-l)At 



Jo 



Let := Efrfe"^*'^-^* dt. We can bound the size of — ^) as \ki — k2\ increases. As E {ct|} is smooth 
in t the modulus of the covariance can be bounded for increasing \ki — k2\, 3& the Fourier transform — ^) 
decays proportionally to \ki — A;2|~"~^ where a is the number of smooth derivatives of E {c^ }. We can also directly 
note that the variance of the discrete Fourier transform of the noise is precisely (this is not a large sample result) 



(12) 



by virtue of being the first difference of white noise (see also [4]). The naive estimator can therefore be rewritten 



as, with5(^)^ = cov{4;^4:^} 



- — - (fe) 



N 



N-l 

E 

fc=0 



J, 



^k,k 



iy)\ 



E 



{") 

{X,X)^ 



I4 

N-l 



5]5W + 0(log(Ai) 

A;=0 
N-l 

k=0 



At) +0{At) 



k ■ 



(13a) 

(13b) 
(13c) 

(13d) 



The Parseval-Rayleigh relationship in (I13ab is discussed in [22], and is used in [21]. We shall now develop a 
frequency domain specification of the bias of the naive estimator. 

Lemma 1: (Frequency Domain Bias of the Naive Estimator) Let Xt be an Ito process and assume that the 
covariance of J^^ and J^^ to be Sj^^ with the chosen sampling. Then the naive estimator of the integrated 
volatility given by ( [T3l ) has an expectation given by: 



E 



(b) 



Af-1 

E 

k=0 



5f,^)+a,2|2sin(7r/fcAt)|2 



+ log(Ai)VAt 



(14) 



E 



— - (") 

{x,xy^' 



N-l 



+ ^ ^2 |2sin(7r/fcAt)p + O (log{At)VAi 

k=0 

0{1) + 0{At-^) + O flog(At)\/At 



Proof: This result follows from the independence of {et} and {Xt}, combined with (ITTI ) and ([T2l) . ■ 
We notice directly from (fT4l) that the relative frequency contribution of AXt and et, i.e. ^ compared to 

the noise contribution |2 sin(7r/fcAt)|^ determines the inherent bias of {X,X)'j! . Estimator ([T3]) is inconsistent 
and biased since it is equivalent to estimator and such a procedure would give an unbiased estimator of the 
integrated volatility only when a1 = 0. When the estimator is expressed in the time domain the microstructure 
cannot be disentangled from the Ito process. On the other hand in the frequency domain, from the very nature of 

(Y) 



a multiscale process, the contributions to Sf, ^ can be disentangled. 
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B. Multiscale Modelling 

To correct the biased estimator we need to correct the usage of the biased estimator of 5^ ^ , 5^ ^ , at each 
frequency. We therefore define a new shrinkage estimator [33, p. 155] of S^. ^ by 



(15) 



^ Lk < I is referred to as the multiscale ratio and its optimal form for perfect bias correction is for an arbitrary 
Ito process given by 



AX) 

^k,k 



5(J+a||2sin(7r/,At)|2' 



(16) 



This quantity cannot be calculated without explicit knowledge of S^^^ and ct^. We can however use (fTOl l to srmpUfy 
([T6l ) to obtain 



-=2 



a 



X 



For a fixed < Li. < 1 



a2, + a2|2sin(7r/fcAt)| 
E{5lf(L.)} 



(17) 



-2 



J, 



{Y) 



^x + O 



'At3/2\ 



k 



where the order terms follow from the continuity of /x^. We can define a new estimator for the true via: 

N-l 



— (m.3) 

{X,X)t 



k=0 



where 



(mi) 

E{{X,X)t 



{X,X)T + 0(log{At)VAt] . 



Recall that {X,X)t = 0{T) = 0(1). Consequently, to leading order we can remove the bias from the naive 
estimator if we know the multiscale ratio. We shall now develop a multiscale understanding of the process under 
observation and use this to construct an estimator for the multiscale ratio. 



C. Estimation of the Multiscale Ratio 

We have a two-parameter description on how the energy should be adjusted at each frequency. We only need 
to determine estimators of cr = (a^, cr^). We propose to implement the estimation using the Whittle likelihood 
methods (see [3] or [34], [35]). For a time-domain sample AY = (Alt^, . . . , AYi„) that is stationary, if suitable 
conditions are satisfied, see for example [8], then the Whittle likeUhood approximates the time domain likelihood, 
with improving approximation as the sample size increases. It is possible to show a number of suitable properties 
of estimators based on the Whittle likelihood, see [34], [35]. For processes that are not stationary, such conditions 
are in general not met, and so the function can be used as an objective function to construct estimators, but not as 
a true likelihood. The Whittle log-likelihood is defined [34], [35] by 



m = log 



Af/2-: 

n 

fc=i 



1 



5, 



kk 



N/2-1 

E 

fc=i 



log (4f +af |2sin(7r/fcAt)|' 



N/2-1 

E 

fc=i 



^kk 



AX) 
^kk 



a||2sin(^/fcAi)r 



If {AX(} is not stationary, then as long as the total contributions of the covariance of the incremental process can 
be bounded, using this likelihood will asymptotically (in At^^) produce suitable estimators, as we shall discuss 
further 
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Definition 2.1: (Multiscale Energy Likelihood) 
The multiscale energy log-likelihood is then defined using (fTOt as: 



N/2-1 N/2-1 -(F) 



k=l 



l{a) = - ^ log [aj, + .2 |2sin(^/,At)|2) - -2 ^ 2 |/-% , a.^I^ ' ^'^^ 



We stress that strictly speaking this may not be a (log-)lLkelihood, but merely a device for determining the multiscale 
ratio. We maximise this function in a to obtain a set of estimators a. 

Theorem 1: (The Estimated Multiscale Ratio) 
The estimated multiscale ratio is given by 



Lk = -1; — 9, (19) 

?2.+a||2sin(7rAAi)r 



where aj^ and maximise l{<j) given in ([TSl l. satisfies 

L 



^- = l + O (At^/^ j . (20) 

Proof: See Appendix lAl ■ 
Combining (fTSl) with ( fT9l ) the proposed estimator of the spectral density of {AXt} is: 

5i?(L,) = L,4P, (21) 

where is given by ( fT9l ). 

Theorem 2: (The Multiscale Estimator of the Integrated Volatility) 
Assume that AXt^ satisfies the conditions of Lemma [J and Theorem [T] The multiscale estimator of the integrated 
volatility defined by 

, . N-l 

iXx)^' =^si^\L,), (22) 

k=0 



where 5^^^(Lfc) is defined by (1211 ) has a mean and variance given by: 

r ( )) 

eUXx)t [ = + O (log(At)VAt) + O (Aii/4 

^ J I — n 



k=0 



and 



^ E [aj] + O (^log(At)^) + O (^At^/^ 



Proof: See Appendix iBl 
We also note that 



^ J fc=0 



< O = var I (X^X)^?^ | , (23) 

unless (jp = 0. We note that the multiscale estimator has lower variance than the naive method of moments estimator 

- — - (6) 

{X, X)rp due to the fact that < < 1. We have thus removed bias and simultaneously decreased the variance, 
the latter effect usually being the main purpose of shrinkage estimators. Note that if we knew the true multiscale 

ratio Lk and used it rather than Lk (i.e. used {X, X)rp ) then we would expect an estimator from this quantity to 
recover the same variance as the estimator based on the noise-free observations. This loss of efficiency is inevitable, 
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as we have to estimate L^. Finally we can also construct a Whittle estimator for the integrated volatility by starting 
from ([Tol l and taking 

=m\. (24) 

The sampling properties of (X, X) j, are found in Appendix |Al and aj^ is asymptotically unbiased. The results 
in Appendix |A] imply that 

var|(X;X)J"^| = T^T^ieri^. (25) 

We see that the variance depends on the length of the time course, the inverse of the signal to noise ratio, the square 
root of the sampling period and the fourth power of the "average standard deviation" of the Xt process.. We may 

— {'w) -— — ^ (mi) 

compare the variance of (1231 ) with the variance of (1251 ). to determine which estimator of (X, X)^ and {X, X)j, 
is preferable. We shall return to this question of relative performance in the examples section, but intuitively argue 

(ui) (mi) 

that {X,X)j, and {X,X)rp are more or less the same estimator, with the latter estimator being more intuitive 
to explain. 



D. Time Domain Understanding of the Method 

We may write the frequency domain estimator of the spectral density of AX^ in the time domain to clarify some 
of its properties. We define 



k=Q 

which when AXt is a stationary process corresponds to the estimated autocovariance sequence of AX^ using the 
method of moments estimator [5, Ch. 5]. We then have 

c(^) _ r , c(^) ^X) -S^f -^Y) (J.. 

u 

and so the estimated autocovariance of the AX^ process, namely , is a smoothed version of s-r We can 
therefore view as the Fourier transform of a smoothed version of the autocovariance sequence of Ay^. We let 

2 

ai^ + |2sm(7rjAtj| 

be the continuous analogue of L^.. To find the smoothing kernel we are using we need to calculate 

A;=0 

= r _2 ^f^. 21 n e^-^- # + (At) ■ (28) 
Thus utilizing integration in the complex plane (see Appendix [C]) we obtain that 



(29) 



These are both decreasing sequences in r. We write = ^ — ^ j . If we can additionally assume that L{f) 
decreases sufficiently rapidly to be near zero by / = ^ then we find that 
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Fig. 1. It as well as and Qt for a chosen value of the SNR (left). The approximate weighting functions perfectly mirror the exact 
calculation. We overlay a Gaussian kernel with the same spread for comparison. It estimated for the MA(6) case (right). 



In the limit of no observation noise ^ cxd) then this sequence becomes a delta function centered at r = 0. 
Let us plot these functions, i.e. ir, and for a chosen case of 'a\la\ k, 0.0331 (the approximate SNR used 
in a later example), in Figure [T] (left). We see that theory coincides very well with practise, and almost perfect 
agreement between the three functions. ^T■ is however a strange choice of kernel, if dictated by the statistical 
inference problem: it has heavier tails than the common choice of the Gaussian kernel, and is extremely peaked 
around zero (a Gaussian kernel with the same variance has been overlaid in Figure [B. This is not strange, as we are 
trying to filter out correlations due to non-Ito behaviour, but counter to our intuition about suitable kernel functions, 
as the differenced Ito process exhibits very little covariance at any lag but zero, the sharp peak at zero is necessary. 

E. Correlated Errors 

In many applications we need to consider correlated observation noise. We assume that despite being dependent 
the is a stationary time series. Stationary processes can be conveniently represented in terms of aggregations 
of uncorrelated white noise processes, using the Wold decomposition theorem [6] [p. 187]. We may therefore write 

the zero-mean observation e+ as 

"•J 

oo 

^t, =^0t,Vt,-t,, (30) 

fc=0 

where Bt^ = 1, J2j ^t, < oo, and {rjt,^} satisfies E{r]t,^} = and E{rit^rit^^} = a^5n,m, a model also used in [31]. 
Common practise would involve approximating the variable by a finite number of elements in the sum, and thus 
we truncate (l30b to some q ^ Z. We therefore model the noise as a Moving Average (MA) process specified by 

Et^ = r]t^ +^dt^r]t^_^, (31) 
k=l 

and the covariance of the DFT of the differenced et- process takes the form: 



c(£) _ _2 
^k,k - ^71 



k=l 



2sin(7r/At) 1^ (32) 



This leads to defining a new multiscale ratio replacing cr^ |2 sin (nfAt) 1 2 of (HI]) with ^2 1 1 + ^^^^ 0fce2^^/'= T 1 2 sin (vr/ At) 

(X) 

We then obtain a new estimator of 5^^, . In general the value of q is not known. To simultaneously implement 
model choice, we need to penalize the likelihood. We define the corrected Aikake information criterion (AICC) by 
[6, p. 303] (refer to ^ for /(cr,0) with o-2|2sin (vr/At) |2 replaced by cr^ |l + ^iLi ^^^^^'^^'^l^ |2sin (vr/At) p) 

AICC(6') = -2l{a,e) + 2^^^^. (33) 

n — p — 3 
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By minimizing this function, in a, 6 and q, we obtain the best fitting model for the noise accounting for overfitting 
by using the penalty term. With this method we retrieve a new multiplier that is applied in the Fourier domain, 
which corresponds to a new smoother in the Fourier domain, where the smoothing window (and its smoothing 
width) have been automatically chosen by the data. See an example of such a smoothing window l^- in in Figure 
[U (right). Here has been estimated from an Ito process immersed in an MA noise process. The spectrum of the 
MA has a trough at frequency 0.42. We therefore expect to reinforce oscillations at period 1/0.42 2.5, which is 
evident from the oscillations of the estimated kernel. For more details of this process see section IIII-EI 



III. Monte Carlo Studies 

In this section we demonstrate the performance of the multiscale estimator through Monte Carlo simulations. 
We first describe the de-biasing procedure of the estimator for the Heston Model using Fourier domain graphs. We 
then present bias, variance and mean square error results of various estimators (including the multiscale estimator, 
the naive estimator and the first-best estimator developed in [36]), for the Heston Model as well as Brownian 
and Omstein Uhlenbeck processes. We then consider the case where the sample path in a Heston Model is much 
shorter and another case where the microstructure noise is greatly reduced. Finally, we consider the case of correlated 
errors and show how a stationary noise process can be captured using model choice methods and then the integrated 
volatility can be estimated using the adjusted multiscale estimator. 



A. The Heston Model 

The Heston model is specified in [16]: 

dXt = {fi- vt/2) dt + atdBt, dvt = k (a - ut) dt + -^vl''^dWt, (34) 

where vt = a^, and Bt and Wt are correlated 1-D Brownian motions. We will use the same parameter values to 
the ones that were used in [36], namely /i = .05, k = 5, a = .04, 7 = .5 and the correlation coefficient between 
the two Brownian motions B and W is p = — .5. We set Xq = and uq = 0.04, which is the long time limit of 
the expectation of the process t'jQ 

We calculate S^^^ and 5^^^ directly from simulated data and average across realizations, producing Figure |2j 
where k is indicated by its frequency = k/N, and only plotted for A; = 0, . . . ,N/2 — 1, as the spectrum (or 
5^^^) is symmetric. We see directly from these plots that (on average as we showed) sj^^^ is constant whilst sj^^jj is 
strongly increasing with k, completely dwarfing the other spectrum at large k. (ITT] ) implies that an equal weighting 
is given to all frequencies for the differenced Ito process. The noise process will in contrast have a spectrum that 
is far from fiat, and a suitable bias correction would shrink the estimator of at higher frequencies. 

We also calculate sj,^^ and 5^^^ for one simulated path, displayed in Figure [3] Here we have used the same 
sample length T and noise intensity a"^ as in [36]: T = 1 day and a'^ = 0.0005^. The length of the sample 
path, r = 1 day or 23, 400s with At = Is, corresponds to one trading day, since we take one trading day to 
be 6.5/i long. Notice the different shape of the two periodograms. will not be distinguishable from 5^^^ at 
higher frequencies, despite the moderate to low intensity of the market microstructure noise. If we observed the 
two components Xt and et separately, then the multiscale ratio could be estimated from sj^^^ and 5^^^ using 
the method of moments formula. In this case, we would estimate by the sample Fourier Transform variances 

= ^^^^ 

'-'kk "T '-'kk 

The corresponding estimator of the integrated volatility becomes 

N-l 



{Xx)'t^'^ = Y,LkSg\ (36) 



k=0 

The estimated multiscale ratio L^, for the Heston model with the specified parameters, is plotted in Figure ID 

'limt^+oo Ei^t = Q. 
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Irequency frequency 

Fig. 2. iS^f"' (left) and iS^^' (right) averaged over 100,000 realizations. Note the different scaling of the y axis in the two figures. 




0.1 0.2 0.3 0.4 0.5 0.1 0.2 0.3 0.4 0.5 

frequency frequency 



Fig. 3. A realisation of iSL (top left), a realisation of iSil (top right) with the Whittle estimates superimposed and of two biased corrected 
estimators of Sf.^ \ using LkSf^p, (bottom left) and LkSf^j^ (bottom right). Notice the different scales in the four figures. Estimated spectra 
are plotted on a linear scale for ease of comparison to the effect of applying Lk- 
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0.2 0.3 
frequency 



Fig. 4. The method of moments estimate Lk from a single realisation, with the Whittle estimate (white line) of Lk superimposed. 



The multiscale ratio cannot be estimated using the method of moments in realistic scenarios, as we only observe 
the aggregated process Yt and not the two processes Xt and et separately. Figure [3] displays the estimated multiscale 
ratio applied to SI J over one path realisation. This plot suggests that the energy over the high frequencies 



has been shrunk and that L^Sj^^ is a good approximation to S^^ 

summation of this function across frequencies should make a good approximation to the integrated volatility. 

and a1) are found separately for each path using the MATLAB function fmincon on (H 



It therefore seems not unreasonable that the 



The parameters (crj^ 



Figure [3] shows a\ and |2 sin(7r/fcAt)| (in white) plotted over the periodograms Sj^^ and S^^ for one simulated 
path. The approximated values of a\ and a1 are quite similar to the averaged periodograms of Figure [H in fact 
the accuracy of the new estimator depends on how consistently these parameters are estimated in the presence of 
limited information from the sampled process Yt. Figure |4] shows the corresponding estimated multiscale ratio 
(in white) from this simulated path, as defined in ( [T9l ). The function decays, as expected, so that it will remove the 
high-frequency microstructure noise in the spectrum of Yt, the ratio is also a good approximation of L^. Figure [3] 
shows LkSj^f^ , which is again similar to . It would appear that the new estimator has successfully removed 
the microstructure effect from each frequency. 

It is worth noting that the ratios and quantify the effect of the multiscale structure of the process. If 
a1 is zero (ie. there is no microstructure noise), then no correction will be made to the spectral density function 
(the ratio will equal 1 at all frequencies). So in the case of zero microstructure noise, the estimate would recover 
s\^'' and from (fT3l ) the estimate of the integrated volatility would simply be the realized integrated volatility of 
the observable process. 

We investigate the performance of the multiscale estimator using Monte Carlo simulations. In this study 50,000 
simulated paths are generated. Table U displays the results of our simulation, where biases, variances and errors are 
calculated using a Riemann sum approximation of the integral 



J, N 
1=1 



af dt. 



(37) 



(it) — — (™2) j_. 

The two estimators {X,X)j, and {X,X)rp (see ([5]) and ( 1361 ) respectively) are both included for comparison, 
even though these require use of the unobservable Xt process. The performance of the first-best estimator in [36] 

(si) 

(denoted by {X, X)j, ) is also included as a well-performing and tested estimator using only the Yt process, as is 

- (b) 

the naive estimator of the realized volatility on Yt at the highest frequency, {X,X)j, , given in ^ (the fifth-best 
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Sample bias 


Sample variance 


Sample RMSE 


' (6) 

(A, A)y 


1.17 X 10" 


2 


1.80 X 10" 


-8 


1.17 X 10" 


-2 


^ (si) 

(A, A)y 


6.44 X 10" 


7 


2.76 X 10" 


10 


1.66 X 10" 


'5 


(A, A)y 


2.90 X 10" 


7 


2.59 X 10" 


10 


1.61 X 10" 


-5 


~ (™) 


2.63 X 10" 


7 


2.59 X 10" 


10 


1.61 X 10" 


-5 


— — - imo^ 


1.39 X 10" 


8 


2.07 X 10" 


10 


1.44 X 10" 


-5 


(^>?' 


1.20 X 10" 


8 


2.06 X 10" 


10 


1.44 X 10" 


-5 



TABLE I 

Simulation study comparing the new estimator with the best estimator of [36]. 




(w) 

estimator in [36]). We also include the performance of (X, X)^ , defined in 

— ~ ("^i) — ' — — ('^i) 

Table U shows that the new estimator, (X, X)j, , is competitive with the first-best approach in [36], {X, X)j. , 

as an estimator of the integrated volatility for the Heston model with the stated parameters. For this simulation 

the new method performed marginally better. The similar performance of the two estimators is quite remarkable, 

given their different approach; both estimators involve a bias-correction, [36] perform this globally by weighting 

different sampling frequencies, whilst we correct locally at each frequency. The realized integrated volatility of Yt 

— ' — (fe) 

at the highest frequency, {X,X)j, , produces disastrous results, as expected. 

- — — ~ (mi) (w) 

We also note that {X,X)j, performs more or less identically to {X,X)j, . These two estimators can almost 
be used interchangeably due to the invariance property of a maximum likelihood estimator. This observation is 

bom out by our simulation studies, and we henceforth only report results for {X,X)rp . Note that the variance 

{w) 

of {X,X)rp can be found from $25i . To compare theory with simulations we note that the average estimated 
standard deviation is 1.6093 x 10^^ whilst the expression for the variance to leading order gives an expression for 

1/2 

= 1.0246 X 10~^, using the parameter values of ct^ 6.8 x 10~^ 



, M 

var<| (X,X)y 



the standard deviation of 
and fj^ w 2.5 x 10"'^. 

A histogram of the observed bias of the new estimator is plotted in Figure [5] along with a histogram of the 
observed bias of the first-best estimator in [36]. The observed bias of our estimator follows a Gaussian distribution 
centred at zero, suggesting that this estimator is unbiased, as out results claim to be true. Comparing our estimator 
to the first-best estimator, it can be seen that the new estimator has similar magnitudes of error also (hence the 
similar Root Mean Square Error (RMSE)). 
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xlO"' xlO"' 

Fig. 6. The histograms of the estimated Wx (a) and (b). 





Sample bias 


Sample variance 


Sample RMSE 


(6) 


1.17 X 10"^ 


1.77 X 10"* 


1.17 X 10"^ 


(si) 


6.52 X 10"^ 


2.68 X 10"" 


5.22 X lO"'^ 


— — (mi) 

{X,X)^ 


3.02 X 10"^ 


1.98 X 10"" 


4.46 X 10"® 


~ ("12) 


1.96 X 10"^ 


6.93 X 10"" 


8.32 X 10"^ 


— (u) 


3.79 X 10"^ 


5.44 X 10~" 


7.38 X 10"^ 



TABLE II 

Simulation study for the Brownian process. 



The new estimator requires calculation of a\ and which will vary over each process due to the limited 
information given from the Yt process. The stability of this estimation is of great importance if the estimator is to 
perform well. Figure [6] shows the distribution of the parameters a\ and a1 over the simulated paths. The parameter 
estimation is quite consistent, with all values estimated within a narrow range. Figure [2] suggests that these estimates 
are roughly unbiased; as 'a\ w 6.8 x 10^^ and g1 2.5 x 10^'' (as a1 12 sin(7r/A;)|^ ?s 1 x 10~^, at = 0.5). 

B. Brownian Process and Ornstein Uhlenbeck Process 

We repeated our simulations for a Brownian Process given by: 

dXt = yj^^dBt, (38) 

where erf = 0.01. We otherwise keep the same simulation setup as before with 50,000 simulated paths of length 

— — — (™i) 

23,400. The results are displayed in Table Ull The new estimator, (X, X)rp , again delivers a marked improvement 

(b) (si) 

on the naive estimator, (X, , and performs marginally better than the first-best estimator in [36], {X^X)^ . 
We also performed a Monte Carlo simulation for the Ornstein Uhlenbeck process given by: 

dXt = Xtdt + V^^tdBt, (39) 

where also = 0.01. Again we retain the same simulation setup and the results are displayed in Table |lll] The 
results are almost identical to that of the Brownian process, with the new estimator again outperforming other 
time-domain estimators. 
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Ssmplc bi^s 


Sample vsri^ncc 


odllipic JXiVloI-- 


' (6) 

(A, A)y 


i. i / X lU 


1. / o X lU 


i. i / X iU 


^ (si) 

(A, A)y 


D.o9 X 10 


^.DD X lU 


o.zU X iU 


(mi ) 


2.95 X 10"^ 


1.97 X 10"^^ 


4.44 X 10"^ 


— (m2) 


5.09 X 10-" 


6.76 X 10"^^ 


8.22 X 10"^ 


(") 


6.29 X 10"" 


5.33 X 10"^^ 


7.30 X 10"^ 



TABLE III 

Simulation study for the Ornstein Uhlenbeck process. 





Sample bias 


Sample variance 


Sample RMSE 


(fc) 


1.17 X 10"^ 


2.29 X 10~" 


1.17 X 10"^ 


(si) 


1.00 X lO"** 


4.51 X 10"^" 


2.13 X 10"^ 


- — ■ — - (mi) 


1.84 X 10"^ 


4.23 X 10""' 


2.06 X 10"^ 


— ("12) 


4.80 X 10"** 


2.42 X 10""' 


1.55 X 10"^ 


— - — (") 


5.27 X 10"* 


2.28 X 10"^" 


1.51 X 10"^ 



TABLE IV 

Simulation study for shorter sampler length. 



C. Comparing estimators over shorter sample lengths 

This section compares estimators for a shorter sample length which will reduce the benefit of subsampling due 
to the variance issues of small-length data but will also affect the variance of the multiscale ratio (c/ Theorem [T]). 

The simulation setup is exactly the same as before (using the Heston model with the same parameters) except 
that T, the simulation length, is reduced by a factor of 10 to 0.1 days or 2340s. Before the results of the simulation 
are reported, it is of interest to see whether the frequency domain methods developed still model each process 
accurately. Figure |7] shows the calculated a\ and ct^ |sin(7rAt//^.)|^ (in white) together with the periodograms 5^^^ 
and 5^,^^ for one simulated path. The estimator still approximates the energy structure of the processes accurately. 
Figure |7] also shows the corresponding estimate of the multiscale ratio (in white) from this simulated path 
(together with L^) and the corresponding plot of Lj^Sj.^ . The new estimator has removed the microstructure noise 
effect and has formed a good approximation of 5^^. . The approximation of the periodograms is still accurate 
despite the shortening of available data. 

Table [TVl displays the accuracy of the estimators over the 50,000 simulated paths. The first-best estimator in [36], 

— (si) — — (mi) 

{X,X)rp , and the new estimator, {X,X)rp , are once again comparable in performance and both estimates are 

(«) 

close to the best attainable RMSE given by, {X,X)j, , the realized integrated volatility on Xt. 



D. Comparing estimators with a low-noise process 

This section compares estimators for smaller levels of microstructure noise. Reducing the microstructure noise will 

— (si) 

reduce the need to subsample. The first-best estimator in [36], {X,X)rp , will have a higher sampling frequency 
and the new estimator will reduce its estimate of accordingly. For very small levels of noise, however, the 
first-best estimator will become zero, as the optimal number of samples becomes n (the highest available). This 
possibility is now examined, using the Heston model as before, with all parameters unchanged except the noise is 
reduced by a factor of 10, ie. = 0.00005^. Note that the path length is kept at its original length of T = 1 day. 

Figure [8] shows the estimates of a'j^ and |2 sin(7rAt/fc)|^ (in white) along with the periodograms and 
S^u for one simulated path along with the corresponding estimate of the multiscale ratio Lk (in white) (plotted 

~ ^ ^(Y) 

over the approximated L^.) and the corresponding plot of L^S^^ . The estimation method works well again; notice 
how the magnitude of the microstructure noise has been greatly reduced (the scale is now of order 10"^ rather 
than 10^^) causing the multiscale ratio Lj^ to be more tempered across the high frequencies than it was before, due 
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Fig. 7. A realisation of 5^^' (top left), a realisation of <S^^' (top right) with the Whittle estimates superimposed, the estimate of (bottom 
left) with the Whittle estimate of Lk superimposed and the biased corrected estimator of Sj.,. using LkSj.^ (bottom right). Notice the 
different scales in the four figures. 



to the smaller microstructure noise. Nonetheless, the new estimator has still detected the smaller levels of noise in 
the data. 

Table |V] reports on the results of 50,000 simulations performed as before. The first-best estimator of [36], 

(si) 

{X,X)rp , categorically failed for this model. This is due to the fact that the optimal number of samples was 
always equal to n, the total number of samples available. Therefore, the first-best estimator was always zero. The 

second-best estimator in [36], denoted by {X,X)j, , was reasonably effective. This is simply an estimator that 
averages estimates calculated from sub-sampled paths at different starting points and is therefore asymptotically 

— - — ■ — (mi) 

biased. The new estimator, {X,X)rp , was remarkably robust, with RMSE very close to the RMSE of estimators 
based on the Xt process. The difference in performance between estimators using Yt and estimators using Xt is 
expected to become smaller with less microstructure noise and this can be seen by the similar order RMSE errors 
between all estimators. Nevertheless, the new estimator was much closer in performance to the realized integrated 
volatility on Xt than it was to any other estimator on y^, a result that demonstrates the precision and robustness 
of this new estimator of integrated volatility. 

E. Correlated Noise 

In this section we consider microstructure noise that is correlated. If this process is stationary, the noise process can 
be modelled as an MA process (as described in Section III-EI ). and the corresponding parameters can be estimated 
by maximising the multiscale Whittle likelihood using dTT] ) and (l32l ). Figure |9] shows the multiscale estimator 



STAT. SCI. REPORT 290/STATISTICS SECTION REPORT TR-08-01 



17 




0.1 0.2 0.3 0.4 0.5 0.1 0.2 0.3 0.4 0.5 

frequency frequency 



Fig. 8. A realisation of 5^^' (top left), a realisation of sj.^} (top right) with the Whittle estimates superimposed, the estimate of (bottom 
left) with the Whittle estimate of Lk superimposed and the biased corrected estimator of Sj.,. using LkSj.^ (bottom right). Notice the 
different scales in the four figures. 





Sample bias 


Sample variance 


Sample RMSE 


— - w 


1.17 X 10""* 


2.11 X 10""' 


1.18 X 10"* 


' (S2) 


3.53 X 10~^ 


1.00 X 10"" 


3.19 X 10"^ 


— — (mi) 


7.63 X 10"" 


2.12 X 10"^" 


1.46 X 10"^ 


— (m2) 


7.91 X 10"" 


2.06 X 10"^" 


1.44 X 10"^ 


(") 


9.83 X 10"" 


2.05 X 10"^" 


1.43 X 10"^ 



TABLE V 

Simulation study for lower market microstructure noise. 



applied to the Heston Model (with the same parameters as before) with a microstructure noise that follows an 
MA(6) process (parameters given in the caption). The Whittle estimates (in white) form a good approximation of 
5^^^ and 5^^^ despite the more complicated nuisance structure. The corresponding estimate of the multiscale ratio 
Lfc (in white) therefore removes energy from the correct frequencies and the con^esponding plot of L}^S\^ is a 
good approximation of S).^ . This is the same noise process and Ito process for which we calculated the optimal 
smoothing window in section HFeI and the trough in the noise at about / = 0.42 corresponds to the oscillations in 
the kernel plotted in Figure [T] 

If the length of the MA(p) process is unknown, then p can be determined using ( [33l ). In Table IVll we show an 
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Fig. 9. A realisation of 5^^' (top left), a realisation of <S^^' (top right) with the Whittle estimates superimposed, the estimate of (bottom 
left) with the Whittle estimate of Lk superimposed and the biased corrected estimator of <S^.^ using LkSj.^ (bottom right). In this example 
we use an MA(6) with 9i = 0.5, 02 = —0.1, 9s = —0.1, 64 = 0.2, 65 = and 6*6 = 0.4. Notice the different scales in the four figures. 



MA(p) 


01 


02 


03 


04 


05 


^6 


07 


6*8 


AICC 


p=l 


0.935 
















-3.208490 X 10" 


p^2 


0.624 


-0.445 














-3.239947 x 10" 


p = 3 


0.658 


-0.459 


-0.046 












-3.240000 X 10" 


p = 4 


0.806 


-0.603 


-0.101 


0.410 










-3.262427 x 10" 


p = 5 


0.813 


-0.606 


-0.101 


0.411 


-0.008 








-3.262416 X 10" 


p = 6 


0.815 


-0.604 


-0.097 


0.420 


-0.003 


0.000 






-3.262409 X lO" 


P^7 


0.807 


-0.613 


-0.114 


0.413 


0.002 


-0.002 


-0.005 




-3.262402 X lO" 


P^8 


0.817 


-0.614 


-0.128 


0.427 


0.005 


0.011 


-0.009 


-0.017 


-3.262384 x lO" 



TABLE VI 

Values of found by modelling the noise process as an MA(p) process for p = 1, . . . , 8. Model choice methods 

(AICC) ARE USED TO SELECT WHICH PROCESS TO MODEL THE NOISE BY, IN THIS CASE THE AICC IS MINIMISED BY SELECTING AN 
MA(4) with the given PARAMETERS. THE TRUE NOISE IS INDEED AN MA(4) PROCESS (WITH PARAMATERS 0i = 0.8, 02 = -0.6, 

03 = 0.1, 04 = 0.4). 



example with p = 4 with paramaters 9i = 0.8, 62 = —0.6, 6*3 = 0.1, 6*4 = 0.4, Clearly p = 4 is identified as the 
best fitting model yielding near to perfect estimates of the noise parameters. The estimator is therefore robust to 
removing the effect of microstructure noise when this process is correlated (and stationary), even if the length of 
the MA{p) process is not explicitly known. 

We also tested our estimator using Monte Carlo simulations in [31] for a variety of MA(1) processes and the 



STAT. SCI. REPORT 290/STATISTICS SECTION REPORT TR-08-01 



19 



results showed a significant reduction in error compared with not only the naive estimator, but also the estimators 
based on a white-noise assumption. Furthermore, the adjusted multiscale estimator performed almost identically to 
our multiscale estimator when we set 6i = and recovered a white-noise process, meaning the loss in precision 
from seaixhing for a parameter unnecessarily was negligible (as to be expected for q <C A^). Notice also that in 
Table |Vl] there appears to be little loss in precision from estimating more parameters in the MA (4) process then 
is required as Op for p > 4 is always estimated to be very close to zero. This further demonstrates the robustness 
and precision of our estimation technique. 

IV. Conclusions 

The problem of estimating the integrated stochastic volatility of an Ito process from noisy observations was 
studied in this paper. Unlike most previous works on this problem, see [26], [36], the method for estimating the 
integrated volatility developed in this paper is based on the frequency domain representation of both the Ito process 
and the noisy observations. The integrated volatility can be represented as a summation of variation in the process 
of interest over all frequencies (or scales). In our estimator we adjust the raw sample variance at each frequency. 
Such an estimator is truly multiscale, as it corrects the estimated energy directly at every scale. In other words, the 
estimator is debiased locally at each frequency, rather than globally. 

To estimate the degree of scale separation in the data we used the Whittle likelihood, and quantified the noise 
contribution by the multiscale ratio. Various properties of the multiscale estimator were determined, see Theorems [T] 
and|2] As was illustrated by the set of examples, our estimator performs extremely well on data simulated from the 
Heston model, and is competitive with the methods proposed by [36], under varying signal-to-noise and sampling 
scenarios. The proposed estimator is truly multiscale in nature and adapts automatically to the degree of noise 
contamination of the data, a clear strength. It is also easily implemented and computationally efficient. 

The new estimator for the integrated stochastic volatility can be written as 

u k 

where the kernel is given by ( |28] ). We can compare this estimator with kernel estimators, see [10]. There the 
estimated increment square is locally smoothed to estimate the diffusion coefficient using a kernel function, 

K{-). Contrary to this approach we estimate the integrated volatility by smoothing the estimated autoco variance 
of /S.Xt.. In particular, we use a data-dependent choice of smoothing window. We show that, from a minimum 
bias perspective, using a Laplace window to smooth is optimal. This data-dependent choice of smoothing window 
becomes more interesting after relaxing the assumptions on the noise process, and treating correlated observation 
error. 

Inference procedures implemented in the frequency domain are still very underdeveloped for problems with a 
multiscale structure. The modern data deluge has caused an excess of high frequency observations in a number 
of application areas, for example finance and molecular dynamics. More flexible models could also be used for 
the high frequency nuisance structure. In this paper we have introduced a new frequency domain based estimator 
and applied it to a relatively simple problem, namely the estimation of the integrated stochastic volatility, for data 
contaminated by high frequency noise. There are many extensions and potential applications of the new estimator. 
Here we list a few which seem interesting to us and which are currently under investigation. 

• Study parameter estimation for noisily observed SDEs which are driven by more general noise processes, for 
example Levy processes. 

• AppUcation of the new estimator to the problem of statistical inference for fast/slow systems of SDEs, of the 
type studied in [24], [26]. 

• Study the combined effects of high-frequency and multiscale structure in the data. A first step in this direction 
was taken in [7]. 
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A. Proof OF THEOREM [H 

Let the true value of a be denoted a* . We differentiate the multiscale energy likelihood function (fTSl ) with 
respect to a to obtain 

aot \ N/2-1 wy) 



a2, + a2|2sin(^A,At)p ^ (a^, + ct| |2 sin(7rAAt)|2 



a2, + a||2sin(vrAAt)|^ .tt fa^ + a| |2 sin(vrAAt)| 
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To remove implicit At dependence we let tx = a'j^/At, and denote derivatives with respect to tx by subscript r. 
Then ir{S') = Atlxio"), and so on. We calculate the expectation and variance of the score functions evaluated at 
a*, and find that the bias of tx is order O (At^/^ log(At)) and the bias of a"^ is order O (At^ log(At)) . These 
contributions become negligible, and are of lesser importance compared to the variance. 

To show large sample properties we Taylor expand the multiscale likelihood with a corresponding to the estimated 
maximum likelihood, and cr' is lying between a and cr*. Then 

UB) = leicT*) + IsrW) \5\ - af\ /At + £,,{a') [a^ - 



We note with the observed Fisher information 



-icr') le{cr') 



that 



{al - af)/At\ 
al-af ) 



-1-1 



4(?)-4K) 



(A-40) 



We henceforth ignore the term J, 



J, 



~(X) (X] 

Jf, as this will not contribute to leading order, and write Jf, 



where formally we would write J^"* or We can observe the suitability of this directly from ( [18] ) and use 

bounds for where we could formally apply these to get bounds on each derivative of l{a) (note that we cannot 
differentiate bounds). To avoid needless technicaUties, the details of this approach will not be reported. To leading 
order 



var ^i^((T) 
var (^ie{cr) 
cov(4(t),4(t) 



N/2-l N/2-1 

E E 

;=1 fc=i 



■'II 



aj, + |2sin(^A.At)|"j [a], + a| |2sin(7r/,At)| 
|2sin(7r/fcAi)|' \2sm[n fiAt)\^ cov (sg\s^P 



N/2-1 N/2-1 

E E 

i=i fc=i + a| |2sin(^/fcAi)|2)' {a\ + ct| |2 sin(7r/, Ai)| 



N/2-1 N/2-1 

E E 



At\2sin{nfiAt)\\oy {sg\s\^^) 



1=1 k=i a 



—2 



g1 |2sin(^/fcAi)|2) {a\ + a| |2 sin(7r/, Ai)f 



We now need to calculate cov ( , 



which is 



E 



E 



jy) qiY) AY) 
rkl ^kk "^U ■ 



(A-41) 



Furthermore 



-E{jpijpr}E{ijrrjr} 



We therefore need to calculate the individual terms of this expression. We note 



(eh2 
kk i ' 



AX) As)* 
"^kl "^kl 



AX)* As) 
"^kl "^kl 



Then it follows 



(A-42) 
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We therefore only need to worry about cov \ 5^^^ , sj^"^ | 



We need 

N rnAt 



1 I fTl/\T. 

N fpAt ^ f-mAt ^ fwAt 

p=l-^(p-l)Ai „=i^{m-l)At „,=i-^{"'-l)At 

^ JV Af AT AT 

=• ]V2 ^ 5^ 5Z E (efcnefcpel„e£pE(M„MpM^M^ 



n=l p=l m=l p=l 



where M„ := J(^n-i)At'^s"'^'' ^^'^ '■~ ^ " ■ Since Brownian motion has independent increments, we have 



that 'E[MnMpMmMpj = EM^ if n = p = m = p, E(M„MfcMmMp) = EM^EM^ if n = k, m = p and 
e( MnMpMmMp) = 0, otherwise. Consequently, 



E 



n=l \n=l / 



N N 



n=l p=l 
^ N N 

+ iV2 Yl Y ekne}pelpeen EM^ EM^ 



n=l p=l 

We use standard bounds on moments of stochastic integrals [19] to obtain the bound 

Ar2 



1 _ ^ 1 pnist 

^Y ^<^N2Y 36^* / ds < CiAtf, 

since, by assumption, Efi^ = ©(ijE We have: 



n=l 



(X) AX) AX) 

Pkl '^kk '^11 



E|J, 
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Ar2 
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^j(X)^.^jiX)^.jiX)^ 
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T rT 



JO 



cos(2vr(A: + /)(^)) + cos(2vr(A; - /)(^)) 



xE {cj2} E {al} dsdu + ©((At)^) 

^ r rE{a2}E{a2} (e2-('=+'){^)+e-2-('=+0{^) 
^^2Mk~i)i^) ^ g-2m(fc-/)(^)^ ^g^^ ^ 0((At)3) 

^(.,-i±i)E(^).E,i±i)E(-^±l) 



Since Efrf is a smooth function of time we can bound the decay of oc j so that: 
in this paper denotes a generic constant, rather than the same constant. 



(A-43) 
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We combine the foregoing calculations with (IA-42b 



var • 



Af/2-l N/2-1 



var 
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We note that 



SiY) AY)\ _ (X) AX) AX) 



Thus it follows that: 



var 
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Pkl '^kk '^11 +^kl 
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The extra order terms acknowledge potential effects from the drift. We need to establish the size of C. Using (|A-42I) 
we find that: 



\C\ < 



At*C2{{k + l)-' + {k-l)-') 
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= 0(log(At)). 

This is negligible in size compared to At^^/^. Similar calculations can bound contributions from the off diagonals 
in the other two calculations. Also as ct^ = TxAt 



N/2-1 

-e[£M^)]= E 



Af 



0(log(At)) = 0{At 
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(A-46) 
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The order terms follow from usual spectral theory on the white noise process, as well as bounds on Jff ■ We can 
also by considering the variance of the observed Fisher information deduce that renormalized versions of the entries 
of the observed Fisher information converge in probability to a constant, or 

diag(A^^/^ A^l/2)Fdiag(A^l/^ Ati/2) 



and thus using Slutsky's theorem we can deduce that: 



diag(A^-l/^Arl/2 



^ diag(A^-l/^ At-V2) N (0, JT-i) 
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where the entries of :F can be found from ( IA-461 ). (IA-441 ) and (IA-45K and 
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We have 
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This expression follows by direct calculation. Asymptotic normality of both % and a"^ follows by the usual 

(w) 

arguments. We can determine the asymptotic variance of {X, X) via 



var <^ (X, X 



(w) 



T^var {tx} 
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igt'^VaI. 



(A-48) 



We see that the variance depends on the length of the time course, the inverse of the signal to noise ratio, the 
square root of the sampling period and the fourth power of the "average standard deviation" of the Xt process. 



B. Proof OF Theorem [2] 

We now wish to use these results to deduce properties of a. Firstly using the well known invariance of maximum 
likelihood estimators to transfer the estimators oia\ and to estimators of {X,X)t- We therefore take 



, , N-l 
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k=0 k=0 

It therefore follows that with tx = tx + 5tx and = + ScjI 
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This implies that the estimator is asymptotically unbiased. We can also note that the variance of the new estimator 
is given by: 

(mi ) 

var <; {X,X)^ 
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By looking at the individual terms of this expression, and noting that the estimated renormalized variance tx = 
Tx + 5tx and = + are linear combinations of s9[\ we can deduce the stated order terms, by again 



noting the y At order of the important terms. However to leading order, this estimator will perform identically to 

~ (uj) 

{X, X) in terms of variance. 



C. Proof of Time Domain Form 

The integral can be calculated from first principles using complex- variables with z = e^*'^-^. Thus dz/df = 2i-Kz 
or df = dz/{2mz). (|28] ) takes the form 
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If 



< 1 we have 
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We then note that: 
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If on the other hand you consider 
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> 1 which in many scenarios is more reaUstic then we find that: 



In this case we find that 
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In both cases the decay of the filter is geometric. We note that in most practical examples decays very rapidly 
in A;. Therefore, we do not need to integrate between —1/2 to 1/2, and only need to integrate over — I/tt to I/tt. 
In this range of / we find that for smalUsh remainder term i?3 we have: sin2(7r/) = Tr^f^ + R3{fTr). Then we note 



1 ^2 +4(7f7r2/2 + i?3(/;r) 



df + C 



crx 

2(7, J_r 
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+ i?4(/) 



ax -^2dri 



Thus we are smoothing the autocovariance sequence with a smoothing window that becomes a delta function as 
ax/oe 00. It is reasonable that this non-dimensional quantity arises as an important factor. 



